This project aims to develop a vision system that can detect a traffic light
counter and recognise the number shown on it. The system uses the you only
look once version 3 (YOLOv3) algorithm because of its robust performance and
reliability, and because it can be deployed on the Nvidia Jetson Nano kit. A total
of 2,204 images containing the numbers 0-9 in green and 0-9 in red were collected;
80% (1,764) of the images were used for training and 20% (440) for
testing. The results showed a total
precision of 89%, recall of 99.2%, F1 score of 70%, intersection over union
(IoU) of 70.49%, mean average precision (mAP) of 87.89% and accuracy of 99.2%,
and the estimated total confidence rates for red and green are 98.4% and 99.3%
respectively.
respectively. The results were compared with the previous YOLOvS5 


Tr atfic counter algorithm, and the results are substantially close to each other as the YOLOvS5 
Traffic light accuracy and recall at 97.5% and 97.5% respectively. 
1. INTRODUCTION

Road safety is globally recognised as one of the most significant problems that need to be
appropriately addressed. Running red lights is one of the most common causes of accidents
at intersections. According to research by the insurance institute for highway safety (IIHS),
traffic light violations resulted in around 928 fatalities and 115,741 injuries on the highways of the United
States in 2020 [1]. Regrettably, most of those killed or seriously injured were in good
health prior to the accidents. Traffic lights are therefore critical to road safety, especially on urban roads.
Several studies on traffic safety have examined the different components of the
traffic system [2]-[4].

Detecting traffic light counters on the road is critical for the safety of drivers, in both
autonomous vehicles and standard cars. The perception system, which gives the vehicle the ability to observe
and comprehend its surroundings, is a fundamental part of an autonomous vehicle, and the development of
autonomous vehicles has been motivated by a desire to reduce the number of accidents worldwide. Detecting
the traffic light counter is an essential perception task for an autonomous vehicle because it informs the
control decision the vehicle must make: whether to slow down and stop at the traffic light junction or to
continue driving and cross the intersection. Furthermore, for a driver who is unfamiliar with the traffic light
signals, a system that helps them see the details of the signals, or act on the remaining time shown on the
counter, can be crucial in a sensitive driving manoeuvre (for instance, crossing an intersection) [5].

The aim of this research is to design and develop a system that detects the traffic light counter and
classifies the numbers (0-9) and their colour (red or green) on the counter, and to compare the results of the
you only look once version 3 (YOLOv3) algorithm with those of the YOLOv5 algorithm in terms of
accuracy and confidence rate. The scope is limited in three ways: the classification covers only the digits
zero to nine on the counter, only red and green numbers are considered, and the traffic light counter
detection is performed in daytime only.


2. RELATED WORK 

This section reviews related literature on traffic light counter detection. Although autonomous
vehicles have been extensively studied, most of the research has focused on road signs and traffic lights
without including the traffic light counter. The study by Bascon et al. [6] focused on road signs, with
detection and recognition based on support vector machines (SVMs), and the system proved to be accurate
and reliable. Furthermore, in [7], detection and recognition made use of the illumination conditions
and multi-exposure images and were also based on an SVM classifier. Although the results of that system were
accurate and reliable, SVM classifiers have since been largely superseded for detection and classification
by convolutional neural networks (CNNs) [8], which are more relevant to contemporary conditions and have
been applied to traffic lights, traffic signals and traffic light counter detection.

Meanwhile, Muller and Dietmayer [9], and Li and Zhou [10] have used single-shot multibox detection 
(SSD) for traffic light detection. They utilised the DriveU traffic light dataset [11] and achieved 95% recall
for small objects and up to 98% recall for larger objects, with false positive rates between 0.1 and 1. Jensen
et al. [12] also demonstrated that using YOLO [13]-[15] with the laboratory for intelligent and safe automobiles
(LISA) traffic light dataset [16] and the logistic activity recognition challenge (LARa) traffic light dataset [17]
produced 96.38% recall for YOLOv3, 68.06% recall for YOLOv2 and 42.3% recall for YOLOv1. Another study [18]
used faster region-based convolutional neural networks (Faster R-CNN) [19] with the LISA traffic light dataset
[16] and the Bosch small traffic light dataset [20], achieving 56.31% mean average precision (mAP) on the Bosch
dataset and 76.37% mAP on the LISA dataset.

None of the studies mentioned above addressed the traffic light counter; they covered the traffic light signals
only. Other research has used deep learning and YOLO for different purposes [21]-[38]. However, the study
by Chand et al. [5] used mask R-CNN [29] specifically for the countdown timer of the traffic light.
The datasets used were Microsoft common objects in context (MS COCO) [30] and the street view house numbers
dataset (SVHN) [31], and the study achieved 82.2% precision and 82.78% recall. The review of past research
shows that multiple researchers have worked on traffic light detection and recognition systems [32] and
compared multiple algorithms to identify the best method for traffic light detection and classification [33].
The same is not true, however, for the detection and classification of the countdown counter on the traffic
light. Therefore, this paper presents a method for detecting and classifying the counter and compares the
performance of two algorithms, YOLOv3 and YOLOv5.


3. METHOD 

This project employed a deep learning method based on the YOLOv3 algorithm, which identifies and recognises
multiple objects in an image in real time. YOLO treats object detection as a regression problem, predicting
bounding boxes and their class probabilities directly from the image. The YOLO method uses a CNN to recognise
objects in real time and, as its name implies, needs only a single forward pass through the network to detect
objects. A single run of the algorithm is therefore sufficient to predict the content of the whole image, with
the CNN predicting multiple bounding boxes and class probabilities simultaneously.
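To make the single-pass regression more concrete, the sketch below decodes one raw YOLOv3 output for one anchor in one grid cell into a box and a class score, following the parameterisation of the original YOLOv3 paper (sigmoid offsets inside the cell, exponential scaling of an anchor box, independent logistic class scores multiplied by objectness). It is a minimal illustration, not this project's code; the function name, the example anchor (116, 90), which is one of the default COCO anchors, and the 416x416 input with a stride of 32 are assumptions.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function used by YOLOv3 for offsets, objectness and class scores."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_prediction(t, cell, anchor, stride):
    """Decode one raw YOLOv3 prediction (hypothetical helper).

    t      -- raw outputs (tx, ty, tw, th, t_obj, *class_logits)
    cell   -- (cx, cy): column and row index of the grid cell
    anchor -- (pw, ph): anchor width and height in input-image pixels
    stride -- input pixels per grid cell at this output scale
    Returns (bx, by, bw, bh) in pixels, the best class index and its score.
    """
    tx, ty, tw, th, t_obj, *class_logits = t
    cx, cy = cell
    pw, ph = anchor
    # Box centre: the sigmoid keeps the offset inside the responsible cell.
    bx = (sigmoid(tx) + cx) * stride
    by = (sigmoid(ty) + cy) * stride
    # Box size: exponential scaling of the anchor (prior) dimensions.
    bw = pw * math.exp(tw)
    bh = ph * math.exp(th)
    # Class score = objectness probability x per-class logistic probability.
    scores = [sigmoid(t_obj) * sigmoid(c) for c in class_logits]
    best = max(range(len(scores)), key=scores.__getitem__)
    return (bx, by, bw, bh), best, scores[best]


# Cell (6, 6) of a 13x13 grid on a 416x416 input, with two example class logits.
box, cls, score = decode_prediction(
    [0.0, 0.0, 0.0, 0.0, 0.0, 2.0, -1.0], cell=(6, 6), anchor=(116, 90), stride=32
)
print(box, cls)  # (208.0, 208.0, 116.0, 90.0) 0
```

In a full detector this decoding runs for every anchor in every cell at all three output scales, after which low-scoring boxes are discarded and overlapping boxes are merged by non-maximum suppression.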

The dataset is a collection of videos of traffic lights with counters, recorded with a smartphone camera around
the city of Melaka, Malaysia. The videos were split into frames at several frames per second, giving a total of
2,204 frames, of which 1,764 (80%) were used for training and 440 (20%) for testing. The flow
chart of the system building and training process is illustrated in Figure 1.
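The 80/20 split can be sketched as follows. The paper performed the split with Roboflow, so this is only an illustrative equivalent; the function name, the seed and the frame file names are hypothetical. Rounding the test count down reproduces the reported division of 2,204 frames into 1,764 for training and 440 for testing.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle items reproducibly and split them into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> repeatable split
    n_test = int(len(shuffled) * test_fraction)  # rounds down: 440.8 -> 440
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names for the 2,204 extracted frames.
frames = [f"frame_{i:05d}.jpg" for i in range(2204)]
train, test = train_test_split(frames)
print(len(train), len(test))  # 1764 440
```

Because neighbouring video frames are nearly identical, a purely random frame-level split can place near-duplicates in both sets; splitting by source video avoids this, at the cost of a less exact 80/20 ratio.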
The dataset was labelled manually, image by image, using the computer vision annotation tool (CVAT),
an online platform, and divided into 20 classes (0-9 red and 0-9 green). The
YOLOv3 algorithm was then trained on the Google Colab platform using Python. For the training process, the
maximum number of batches was set to 40,000 and the number of filters to 75. Figure 2 shows sample images
from the training dataset.
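Both training settings follow from the class count if, as assumed here, training used a Darknet-style YOLOv3 .cfg file and the "maximum batch value" is its max_batches parameter. Each [yolo] layer predicts 3 anchors, each with 4 box coordinates, 1 objectness score and 20 class scores, giving (20 + 5) x 3 = 75 filters in the convolution before it. The max_batches and steps rule of thumb used below (classes x 2000, but at least 6000 and at least the number of training images; steps at 80% and 90%) is the one recommended in the AlexeyAB Darknet README. The function name is hypothetical.

```python
def yolov3_cfg_values(num_classes, num_train_images=0, anchors_per_scale=3):
    """Derive the class-dependent values of a Darknet-style YOLOv3 .cfg file.

    filters: the 1x1 convolution before each [yolo] layer outputs, for each
    of its anchors, 4 box coordinates + 1 objectness score + 1 score per class.
    max_batches/steps: rule of thumb from the AlexeyAB Darknet README
    (classes * 2000, but not less than 6000 or the number of training images;
    learning-rate steps at 80% and 90% of max_batches).
    """
    filters = (num_classes + 5) * anchors_per_scale
    max_batches = max(num_classes * 2000, 6000, num_train_images)
    # Integer arithmetic avoids floating-point surprises in the step values.
    steps = (max_batches * 8 // 10, max_batches * 9 // 10)
    return {"classes": num_classes, "filters": filters,
            "max_batches": max_batches, "steps": steps}


cfg = yolov3_cfg_values(num_classes=20, num_train_images=1764)
print(cfg)
# {'classes': 20, 'filters': 75, 'max_batches': 40000, 'steps': (32000, 36000)}
```

For 20 classes this yields the 75 filters and 40,000 batches reported above.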


Figure 1. Flow chart of the system building and training process: collecting the dataset; labelling each image
accordingly; matching the annotations with the images and dividing them into training and testing sets using
Roboflow; training the network for YOLOv3.
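The labelling and annotation-matching steps of Figure 1 end with one label file per image. The sketch below illustrates that output, assuming the standard YOLO/Darknet text format (one line per box: class index, then box centre, width and height, all normalised by the image size) and a hypothetical class ordering with red 0-9 as indices 0-9 and green 0-9 as indices 10-19; the paper does not state its ordering, and all names are illustrative. The inverse mapping also shows how a 20-class label splits into the separate digit and colour used by the 12-class YOLOv5 scheme described in Section 4.2.

```python
COLOURS = ("red", "green")  # hypothetical order: red 0-9 -> 0-9, green 0-9 -> 10-19


def class_id(colour, digit):
    """Map a (colour, digit) pair to one of the 20 class indices."""
    if colour not in COLOURS or not 0 <= digit <= 9:
        raise ValueError(f"unsupported label: {colour} {digit}")
    return COLOURS.index(colour) * 10 + digit


def split_class_id(cid):
    """Inverse mapping: a 20-class index back to (colour, digit)."""
    colour_idx, digit = divmod(cid, 10)
    return COLOURS[colour_idx], digit


def to_yolo_line(cid, box, img_w, img_h):
    """Format a pixel box (x_min, y_min, x_max, y_max) as one YOLO label line."""
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w  # normalised box centre
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w  # normalised box size
    h = (y_max - y_min) / img_h
    return f"{cid} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# A green "7" occupying a 40x60-pixel region of a 1280x720 frame.
print(to_yolo_line(class_id("green", 7), (100, 50, 140, 110), 1280, 720))
# 17 0.093750 0.111111 0.031250 0.083333
```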


Figure 2. Sample of the dataset 


4. RESULTS AND DISCUSSION 
4.1. Training output 

After training, the trained model was evaluated automatically on the testing dataset in Google Colab.
Table 1 shows the results obtained after the training process. Precision and recall were calculated using (1)
and (2), where TP, FP and FN denote the numbers of true positives, false positives and false negatives, respectively.


$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$
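Equations (1) and (2) translate directly into code. The sketch below also includes the F1 score, the harmonic mean of precision and recall, which is reported in this paper but not defined here, and checks it against Table 1: it reproduces the precision of the red digit 8 and shows that the red "Average precision" equals the unweighted mean of the ten per-class values. Function names are illustrative.

```python
def precision(tp: int, fp: int) -> float:
    """Equation (1): share of detections that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Equation (2): share of ground-truth objects that are detected."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Red digit 8 in Table 1: 20 true positives, 8 false positives.
print(round(100 * precision(20, 8), 1))  # 71.4

# Red per-class precisions (%) from Table 1, digits 1-9 then 0.
red = [63.63, 100, 100, 100, 100, 100, 100, 71.4, 100, 80]
print(round(sum(red) / len(red), 1))  # 91.5
```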
Table 1. Results from YOLOv3 algorithm

| Number | Red true positives | Red false positives | Red precision (%) | Green true positives | Green false positives | Green precision (%) |
|---|---|---|---|---|---|---|
| 1 | 14 | 8 | 63.63 | 41 | 4 | 91.1 |
| 2 | 16 | 0 | 100 | 19 | 5 | 79.1 |
| 3 | 16 | 0 | 100 | 19 | 0 | 100 |
| 4 | 16 | 0 | 100 | 19 | 5 | 79.1 |
| 5 | 19 | 0 | 100 | 13 | 0 | 100 |
| 6 | 34 | 0 | 100 | 19 | 5 | 79.1 |
| 7 | 28 | 0 | 100 | 14 | 0 | 100 |
| 8 | 20 | 8 | 71.4 | 21 | 5 | 80.07 |
| 9 | 10 | 0 | 100 | 9 | 1 | 90 |
| 0 | 20 | 5 | 80 | 9 | 5 | 64 |
| Average precision (%) | | | 91.5 | | | 86.247 |

| Total precision (%) | Recall (%) | F1 score (%) | IoU (%) | mAP (%) | Iterations |
|---|---|---|---|---|---|
| 89 | 83 | 70 | 70.49 | 87.89 | 40,000 |
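Table 1 also reports an average IoU. IoU measures how well a predicted box overlaps its ground-truth box: the area of their intersection divided by the area of their union. In common evaluation protocols (for example PASCAL VOC-style mAP at a 0.5 threshold), a detection counts as a true positive only when its IoU with a ground-truth box of the same class reaches the threshold; the paper does not state which threshold it used, so the sketch below only computes IoU itself, for boxes assumed to be given as (x_min, y_min, x_max, y_max) in pixels.

```python
def iou(box_a, box_b) -> float:
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the intersection rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes overlapping by half their width: 50 / 150.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.3333333333333333
```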


4.2. Testing evaluation for classification 

The trained model was then tested manually on 2,000 images for classification, and the results
were compared. Tables 2 and 3 show the confusion matrices of the YOLOv3 and YOLOv5S algorithms. The YOLOv3
algorithm was trained on 20 classes (0-9 red and 0-9 green), while the YOLOv5 algorithm was trained on 12
classes: the digits 0-9 without colour, plus red and green classes derived from the colour of the traffic light
bulb or arrow. The accuracy is obtained using (3), where TN is the number of true negatives.


$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{3}$$
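Equation (3) can be written as code. Tables 2 and 3 list only true positives and false negatives per class; assuming FP and TN are therefore taken as zero (which is consistent with the identical Recall and Accuracy columns), accuracy reduces to TP / (TP + FN), the recall of (2). The sketch reproduces both reported averages from the table totals; the function name is illustrative.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Equation (3): correct decisions over all decisions."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Totals from Table 2 (YOLOv3) and Table 3 (YOLOv5), with FP = TN = 0 assumed.
yolov3 = accuracy(tp=1984, fp=0, tn=0, fn=16)
yolov5 = accuracy(tp=1170, fp=0, tn=0, fn=30)
print(f"{yolov3:.1%} {yolov5:.1%}")  # 99.2% 97.5%
```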
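Table 2. YOLOv3 confusion matrix for red and green numbers

| Number | Red true positives | Red false negatives | Red recall (%) | Red accuracy (%) | Green true positives | Green false negatives | Green recall (%) | Green accuracy (%) |
|---|---|---|---|---|---|---|---|---|
| 1 | 100 | 0 | 100 | 100 | 100 | 0 | 100 | 100 |
| 2 | 100 | 0 | 100 | 100 | 100 | 0 | 100 | 100 |
| 3 | 97 | 3 | 97 | 97 | 100 | 0 | 100 | 100 |
| 4 | 100 | 0 | 100 | 100 | 100 | 0 | 100 | 100 |
| 5 | 98 | 2 | 98 | 98 | 95 | 5 | 95 | 95 |
| 6 | 100 | 0 | 100 | 100 | 100 | 0 | 100 | 100 |
| 7 | 99 | 1 | 99 | 99 | 100 | 0 | 100 | 100 |
| 8 | 100 | 0 | 100 | 100 | 100 | 0 | 100 | 100 |
| 9 | 100 | 0 | 100 | 100 | 100 | 0 | 100 | 100 |
| 0 | 98 | 2 | 98 | 98 | 97 | 3 | 97 | 97 |

| Total true positives | Total false negatives | Average recall (%) | Average accuracy (%) |
|---|---|---|---|
| 1,984 | 16 | 99.2 | 99.2 |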
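Table 3. YOLOv5 confusion matrix

| Class | True positives | False negatives | Recall (%) | Accuracy (%) |
|---|---|---|---|---|
| 1 | 93 | 7 | 93 | 93 |
| 2 | 97 | 3 | 97 | 97 |
| 3 | 91 | 9 | 91 | 91 |
| 4 | 96 | 4 | 96 | 96 |
| 5 | 93 | 7 | 93 | 93 |
| 6 | 100 | 0 | 100 | 100 |
| 7 | 100 | 0 | 100 | 100 |
| 8 | 100 | 0 | 100 | 100 |
| 9 | 100 | 0 | 100 | 100 |
| 0 | 100 | 0 | 100 | 100 |
| Green | 100 | 0 | 100 | 100 |
| Red | 100 | 0 | 100 | 100 |

| Total true positives | Total false negatives | Average recall (%) | Average accuracy (%) |
|---|---|---|---|
| 1,170 | 30 | 97.5 | 97.5 |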
4.3. Traffic light counter with bounding box

The model was then tested on detecting the numbers shown on the traffic light counter to
obtain the average confidence rate of the system. Sample images with bounding boxes and
confidence rates are given in Figure 3, while Figures 4 and 5 show the average confidence rates of the
trained YOLOv3 algorithm over 200 test images (10 images per class), with good detection results and
confidence rates. Finally, the model was deployed on an Nvidia Jetson Nano kit (JN) to evaluate the
frames per second (FPS) achieved by the JN and the algorithm. Table 4 and Figure 6 compare the
average FPS on the Nvidia Jetson Nano kit with that on the Tesla T4 cloud GPU provided by Google Colab.
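An average FPS figure such as those in Table 4 can be measured by timing repeated inference calls. The sketch below is a hedged illustration, not the procedure used in this work: `infer` is a hypothetical stand-in for one forward pass of the deployed detector, and the warm-up count of 5 frames is an assumption.

```python
import time
from typing import Callable, Sequence


def measure_fps(infer: Callable[[object], object],
                frames: Sequence[object], warmup: int = 5) -> float:
    """Average frames per second of `infer` over `frames`.

    `infer` is a hypothetical stand-in for one detector forward pass. The
    first `warmup` frames are processed but not timed, because start-up
    work such as memory allocation can make the first calls slower.
    """
    for frame in frames[:warmup]:
        infer(frame)
    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warmup")
    start = time.perf_counter()
    for frame in timed:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed


# Dummy "model" that sleeps about 10 ms per frame, so the result is at most ~100 FPS.
fps = measure_fps(lambda frame: time.sleep(0.01), frames=list(range(25)))
print(f"{fps:.1f} FPS")
```

If `infer` launches asynchronous GPU work, it must wait until the results are ready before returning; otherwise the timing measures only the launch overhead and overstates the FPS.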
Figure 5. Confidence rate for green numbers for YOLOv3


Table 4. Average frames per second

| Algorithm / GPU | Tesla T4 | Nvidia Jetson Nano |
|---|---|---|
| YOLOv3 | 12 FPS | 2.5 FPS |
Figure 6. YOLOv3 FPS on different GPUs (average frames per second on the Tesla T4 and the Nvidia Jetson Nano)


5. CONCLUSION 

In conclusion, the YOLOv3 algorithm was successfully tested on the dataset collected around the
city of Melaka, Malaysia. A total of 2,204 images were labelled via CVAT and split into 80% for training and
20% for testing, and the model was trained on Google Colab. The system was able to detect the traffic light
counter and classify the numbers (0-9) and their colour (red or green). Accuracy and recall both reached 99.2%,
precision 89%, intersection over union (IoU) 70.49% and mAP 87.89%. YOLOv5S was also tested for comparison,
and the two algorithms gave close results in terms of accuracy and reliability (99.2% versus 97.5% accuracy).
However, YOLOv5S has compatibility limitations with the Nvidia Jetson Nano kit, as it cannot be deployed on it.
Moreover, YOLOv3 is lighter and has fewer layers; thus, it should achieve better FPS on both the Jetson Nano
and a personal computer.